ABSTRACT

Homing endonucleases (HEs) promote the evolu- 
tionary persistence of selfish DNA elements by 
catalyzing element lateral transfer into new host or- 
ganisms. The high site specificity of this lateral 
transfer reaction, termed homing, reflects both the 
length (14-40 bp) and the limited tolerance of target 
or homing sites for base pair changes. In order to 
better understand molecular determinants of 
homing, we systematically determined the binding 
and cleavage properties of all single base pair 
variant target sites of the canonical LAGLIDADG
homing endonucleases I-CreI and I-MsoI. These
Chlorophyta algal HEs have very similar three-dimensional
folds and recognize nearly identical
22 bp target sites, but use substantially different
sets of DNA-protein contacts to mediate
site-specific recognition and cleavage. The site specificity
differences between I-CreI and I-MsoI
suggest different evolutionary strategies for HE persistence.
These differences also provide practical
guidance in target site finding, and in the generation 
of HE variants with high site specificity and cleavage 
activity, to enable genome engineering applications. 

INTRODUCTION 

The lateral transfer of mobile introns or inteins into new
host organisms is termed 'homing' (1-4). The homing
reaction has three specific components: the laterally
transferred genetic element, typically a self-splicing
intron or intein; a homing endonuclease (HE) protein, 
often encoded within the mobile intron or intein; and a 
target or 'homing' site that can be cleaved by a cognate 
HE. Target site cleavage initiates recombination-mediated 
repair using the homologous intron or intein-containing 
donor allele as a repair template. Successful repair results 
in transfer of the mobile intron or intein into the 
cleaved host target site to create a new intron-containing 
(or 'intron+') allele. Intron/intein insertion disrupts 
the target site, and thus prevents additional rounds 
of cleavage that could result in intron loss. Homing 
is thus an efficient, unidirectional lateral transfer mechan- 
ism that ensures the potential for additional rounds of 
intron or intein lateral transfer into HE cleavage-sensitive 
hosts (4). 

Several hundred HEs encoded by mobile introns or 
inteins have been identified across all domains of life. 
These HEs have been categorized into five families on 
the basis of shared sequence motifs: the LAGLIDADG, 
HNH, His-Cys box, GIY-YIG and PD-(D/E)XK HE 
families (5,6). The LAGLIDADG homing endonuclease 
(LHE) family, with several hundred members, is the 
largest and best-studied of these families (7,8). LHE
proteins share an αββαββα core fold in which the conserved
'LAGLIDADG' protein motif forms a dimerization or
intramolecular folding interface, and contributes catalytic
aspartic or glutamic acid residues to the nuclease
active sites. DNA target site recognition is mediated by
contacts made by LHE amino acid side chains to DNA
bases or to the phosphodiester backbone. Most of these
interface amino acid residues are located in anti-parallel
β-sheets flanking the LAGLIDADG interface.

The high site specificity of LHE cleavage reflects the 
large number of phased, direct and water-mediated 
contacts made by LHEs to target site DNA (5). Despite 
their high site specificity, many LHEs appear to tolerate 
some target site base pair changes without a loss of site 
binding or cleavage [see, e.g. (9,10)]. This seemingly
paradoxical combination of high site specificity with the ability
to tolerate target site base pair changes may be
evolutionarily advantageous: high site specificity minimizes
toxicity to current hosts by limiting off-target cleavage, 
while the ability to tolerate target site genetic variation 
may maximize the potential for continued lateral spread 
(11-14). 



In order to better understand the molecular determin- 
ants of homing and lateral transfer, we determined the 
ability of two well-characterized, homologous LHE 
proteins to bind and to cleave all single base pair 
variants of their native DNA target sites. The proteins,
I-CreI and I-MsoI, are encoded by mobile introns in the
chloroplast DNAs of different Chlorophyta algal species.
Both homodimeric endonucleases share a common
three-dimensional fold and 22 bp target sites that differ
at only 2 of 22 bp positions. The I-CreI and I-MsoI
target sites are pseudo-palindromic, reflecting the
homodimeric architecture of both endonucleases, and
have left and right halves that share, respectively, 7 of
11 and 5 of 11 bases (Figure 1) (15,16). Our prior structural
analyses demonstrated that I-CreI and I-MsoI use
substantially different sets of DNA-protein contacts to
recognize their target sites (Figure 1C) (17,18). In work
reported here, we systematically determined the in vitro
binding and cleavage properties of I-CreI, I-MsoI and
the monomeric versions mCreI and mMsoI (19) on
all single base pair variant DNA target sites (Figure 1).
We also determined the in vivo cleavage specificity profiles
for mCreI and mMsoI in human cells using the same
target site libraries.

Figure 1. I-CreI, I-MsoI and their monomeric variants. (A) Crystal structure of I-CreI (green) bound with target site DNA (red) (17). The I-CreI
target site is shown below the co-crystal structure, with target site nucleotide positions numbered relative to the center of symmetry. Two-fold
symmetric, palindromic target site positions are shown in shaded boxes, and the location of top and bottom strand cleavage sites by filled arrows.
mCreI, a monomeric/single chain version of I-CreI, was generated by connecting the two I-CreI domains with a 33 amino acid residue linker whose
sequence is shown below the target site in single letter amino acid code. The unique portion of the linker, located between flexible GS repeats, is
underlined (19). (B) Crystal structures of I-MsoI (green) and mMsoI (cyan) bound with their DNA target site (18,19). The I-MsoI target site,
location of palindromic target site positions and the location of cleavage sites are indicated as in (A) above. The 33 residue linker sequence used to
create mMsoI is indicated below the target site in single letter amino acid code with the unique portion of the linker underlined (19). (C) DNA-protein
interfaces of I-CreI and I-MsoI bound to native DNA target sites. The DNA-protein contact maps for I-CreI (top) and I-MsoI (bottom)
were determined from their respective co-crystal structures (17,18) and redrawn in common format. Protein side chains that are conserved in
structure and DNA binding function are indicated by yellow ovals. Base pairs at the ±11 positions, shown in pink, differed from native target
site base pairs and were included in successful crystallization oligonucleotides.

MATERIALS AND METHODS 

Materials 

The bacterial protein expression plasmids pET15b and 
pET24d were obtained from Novagen (Gibbstown, NJ, 
USA). The Escherichia coli protein expression host 
strain C2566 was obtained from New England Biolabs 
(Ipswich, MA, USA). DNA oligonucleotides (50-nmol 
scale, salt-free) were synthesized by Operon (Huntsville, 
AL, USA). Qiaquick PCR purification kits and Ni-NTA 
HisSorb plates were obtained from Qiagen (Valencia, CA, 
USA). Other reagents, including restriction enzymes, Taq 
DNA polymerase and T4 DNA ligase were obtained from 
New England Biolabs (Ipswich, MA, USA) or 
Sigma-Aldrich (St Louis, MO, USA). 

Protein expression and purification 

Homing endonuclease proteins were expressed and 
purified as previously described by nickel affinity chroma- 
tography (19). Proteins for in vitro binding assays were 
expressed from pET15b with an N-terminal 6 x His 
affinity purification/binding tag. Proteins used for 
in vitro cleavage assays were expressed and purified from 
pET24d without affinity tags. 

In vitro binding specificity 

The in vitro target site binding affinities were determined 
using a competitive binding assay as previously described 
(20). In brief, proteins with N-terminal 6×His tags were
immobilized in Ni-NTA HisSorb 96-well plates (Qiagen)
by incubating 200 µl of 100 nM protein in TBS/BSA buffer
(50 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.2% BSA) in
plate wells for 2 h at room temperature. Unbound protein
was removed by washing plates four times with TBS/
Tween-20 [50 mM Tris-HCl (pH 7.5), 150 mM NaCl,
0.05% Tween-20]. Immobilized proteins were then
incubated with a mixture of 100 nM fluorescently labeled
native target site DNA and 3 µM (a 30-fold excess) of each
of three unlabeled competitor target site DNAs containing
alternative base pair substitutions at a given target site
base pair position in 200 µl of binding buffer [50 mM
Tris, pH 7.5, 150 mM NaCl, 0.02 mg/ml poly(dI-dC),
10 mM CaCl2]. After incubation for 4 h at room temperature,
plates were washed four times with TBS (50 mM
Tris, pH 7.5, 150 mM NaCl). Retained fluorescence was
quantified on a SpectraMax® M5/M5e microplate reader
(Molecular Devices; excitation, 510 nm; emission, 565 nm;
cutoff, 550 nm). All measurements were performed in
triplicate, and relative in vitro binding affinities were
calculated using the following formula:

Relative binding affinity =

[(F(n) - F(x)) × F(t)] / [(F(n) - F(t)) × F(x)]

where F(x), F(t) and F(n) indicate fluorescence intensities
of wells containing immobilized protein that were
incubated with unlabeled base substitution target sites
[F(x)] or with the native [F(t)] or a scrambled sequence
target site [F(n)].
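
The formula can be checked numerically with a minimal Python sketch. The function name and the fluorescence readings below are hypothetical and serve only to illustrate the calculation: a competitor that displaces the labeled site as well as the native site scores 1.0, and a competitor that behaves like the scrambled site scores 0.

```python
def relative_binding_affinity(f_x: float, f_t: float, f_n: float) -> float:
    """Relative binding affinity from retained fluorescence.

    f_x: well competed with a base-substituted (variant) target site
    f_t: well competed with the native target site
    f_n: well competed with a scrambled sequence site
    """
    if f_x <= 0 or f_n == f_t:
        raise ValueError("need F(x) > 0 and F(n) != F(t)")
    return ((f_n - f_x) * f_t) / ((f_n - f_t) * f_x)


# Hypothetical readings in arbitrary fluorescence units (not measured data).
native_like = relative_binding_affinity(f_x=200.0, f_t=200.0, f_n=1000.0)
weak_binder = relative_binding_affinity(f_x=800.0, f_t=200.0, f_n=1000.0)
non_binder = relative_binding_affinity(f_x=1000.0, f_t=200.0, f_n=1000.0)

print(native_like, weak_binder, non_binder)
```

In practice each F value would be the mean of the triplicate wells described above.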

In vitro cleavage specificity 

I-CreI and I-MsoI target sites were synthesized as
pairs of complementary DNA oligonucleotides, which
were annealed and ligated into the XhoI and SacI sites
of the recombination reporter plasmid pDR-GFP-univ,
a universal target site version of pDR-GFP
(19,21) (http://depts.washington.edu/monnatws/plasmids/pDR-GFP-univ.pdf).
The I-CreI site library consisted of
61 unique target site plasmids: a native I-CreI target site
plasmid and 60 additional plasmids with single base pair
substitutions covering target site positions -10 to +10.
The corresponding I-MsoI site library consisted of 67
unique target site plasmids: a native I-MsoI target site
plasmid and 66 additional plasmids with single base pair
substitutions covering target site positions -11 to +11. Each
target site was amplified from pDR-GFP-univ plasmid
DNA using pairs of primers that placed the target site at
the center of different-sized amplicon products.
Amplicons harboring target site A substitutions were
2200 bp long, and those with C, G or T substitutions
were, respectively, 1900, 1600 or 1320 bp long (Figure
3A). After purification, equimolar amounts of the four
DNA substrate fragments for each target site position
were combined to generate 20 I-CreI (or 22 I-MsoI)
substrate pools for competitive
in vitro cleavage assays.
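
The band pattern expected from one substrate pool can be sketched in a few lines of Python. This is an illustration only: it assumes cleavage falls exactly at the amplicon midpoint, so that the two products of each amplicon co-migrate, and it ignores the short staggered overhang left by the endonuclease. The names are hypothetical.

```python
# Amplicon length (bp) that encodes the base pair present at the tested position.
AMPLICON_BP = {"A": 2200, "C": 1900, "G": 1600, "T": 1320}


def expected_bands(amplicon_bp: dict) -> dict:
    """Uncut substrate and cleavage product sizes for each base-coded amplicon."""
    return {
        base: {"substrate": size, "product": size // 2}
        for base, size in amplicon_bp.items()
    }


bands = expected_bands(AMPLICON_BP)

# All band sizes in one pool, largest first, as they would resolve on a gel.
ladder = sorted(
    {size for entry in bands.values() for size in entry.values()},
    reverse=True,
)
print(ladder)
```

Because the eight sizes are all distinct under these assumptions, each substrate and product band can be assigned to a single base pair in the same lane.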

HE proteins for in vitro cleavage assays were expressed 
in E. coli host strain C2566 from pET24d and purified as 
previously described (19). In vitro cleavage assays were
conducted in 20 mM Tris pH 8.0, 10 mM MgCl2 at a
1:1 protein/DNA ratio under conditions where ~50% of
the wild-type target site was cleaved. This corresponded to
15 min at 37°C for I-CreI/mCreI, and 1 h at 37°C for
I-MsoI/mMsoI. Digests were stopped by adding loading
buffer containing 0.1% SDS to samples, and the ladders 
of substrate and product fragments from each digest 
were separated by agarose gel electrophoresis. Fragment 
band intensities were quantified after staining using 
ImageQuant (Molecular Dynamics). Target site cleavage 
efficiency was calculated from the ratio of substrate to 
product bands, normalized to the cleavage efficiency of 
the native base pair at each target site position to 
generate a relative in vitro cleavage index. 
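
One way to turn band intensities into a relative cleavage index is sketched below. The text specifies only that the ratio of substrate to product bands was used and normalized to the native base pair, so defining the cleaved fraction as product / (substrate + product) is an assumption made here, as are the function names and the intensity values.

```python
def fraction_cleaved(substrate: float, product: float) -> float:
    """Cleaved fraction of one amplicon (assumed definition, see lead-in)."""
    total = substrate + product
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return product / total


def relative_cleavage_index(bands: dict, native_base: str) -> dict:
    """Normalize each base pair's cleaved fraction to that of the native base pair."""
    native = fraction_cleaved(*bands[native_base])
    if native == 0:
        raise ValueError("native site shows no cleavage; cannot normalize")
    return {
        base: fraction_cleaved(substrate, product) / native
        for base, (substrate, product) in bands.items()
    }


# Hypothetical (substrate, product) intensities for one target site position.
example_bands = {"A": (50.0, 50.0), "C": (90.0, 10.0), "G": (100.0, 0.0), "T": (25.0, 75.0)}
index = relative_cleavage_index(example_bands, native_base="A")
print(index)
```

A real analysis would also need to account for the dependence of staining intensity on fragment length, which this sketch leaves out.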

In vitro competitive cleavage assay 

The plasmid substrates for in vitro competitive cleavage
assays were constructed by cloning both a native and a
test target site into pCcdB (22) at two different locations:
the native target site into AflIII/BglII-cleaved plasmid
DNA, and the test target site into NheI/SacII-cleaved
plasmid DNA. In order to perform competitive cleavage
assays, plasmid substrates were linearized by XbaI
digestion, and 100 ng of linear plasmid substrate was
incubated with LHEs in the presence of 20 mM Tris pH
8.0, 100 mM NaCl, 10 mM MgCl2 for 1 h at 37°C.
Cleavage reactions were quenched by adding 10 mM
EDTA and 1% SDS followed by heating for 10 min at
60°C. Plasmid substrate and cleavage products were
separated by agarose gel electrophoresis, visualized by 
staining with ethidium bromide and photographed for 
quantitation. 

In vivo cleavage/recombination assays 

The ability of mCreI and mMsoI to cleave target sites in
human cells was determined using a human cell transient
transfection assay as previously described [Figure 4A;
(19)]. In brief, 293T cells (3 × 10^5 cells/well in 500 µl of
growth medium) were plated in 24-well plates 1 d prior
to transfection and grown in a 37°C humidified, 5% CO2
incubator. Complete growth medium consisted of
Dulbecco-modified Eagle's medium (Cellgro) supplemented
with 10% fetal bovine serum (Cellgro) and 1%
penicillin/streptomycin (Gibco). Wells were 50-80% confluent
at the time of co-transfection with a pDR-GFP-univ
target site plasmid and a coding plasmid for either mCreI
or mMsoI. Transfections contained a total of 1.5 µg
plasmid DNA/well (3:1 molar ratio of expression to
target site plasmid DNA), and were performed overnight
at 3% CO2 using a modified calcium phosphate protocol
(19,23).

Transfected cells were analyzed by flow cytometry 48 h
after transfection to quantify the frequency of generation
of cleavage-dependent GFP+ recombinant cells. In brief,
cells were trypsinized and washed in PBS, resuspended
at ~10^6 cells/ml in PBS buffer and then stained with
propidium iodide (10 ng/µl) prior to analysis on an
Influx flow cytometer (Cytopeia). Typically 50 000 events
were scored for each transfected sample by gating log side
versus linear forward scatter and for PI exclusion to
quantify the fraction of GFP+ viable cells. Transfection
efficiency was monitored in all experiments by using a
GFP+ positive control vector pEGFP-C1 (Clontech) in
the same experiment. In vivo cleavage efficiencies for 
single base pair variant target sites were calculated from 
the percent GFP+ cells, corrected for transfection effi- 
ciency and normalized against the GFP+ frequency of 
the native base pair at each target site position. 
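
The normalization can be sketched as follows. The text does not state how the transfection efficiency correction was applied, so dividing the percent GFP+ cells by a per-sample transfection efficiency is an assumption of this sketch; the function name and all numbers are hypothetical.

```python
def relative_in_vivo_cleavage(samples: dict, native_base: str) -> dict:
    """Relative in vivo cleavage efficiency at one target site position.

    samples maps each base pair to (percent GFP+ cells, transfection efficiency),
    with the efficiency expressed as a fraction between 0 and 1.
    """
    corrected = {
        base: pct_gfp / efficiency
        for base, (pct_gfp, efficiency) in samples.items()
    }
    native = corrected[native_base]
    if native == 0:
        raise ValueError("native site gave no GFP+ cells; cannot normalize")
    return {base: value / native for base, value in corrected.items()}


# Hypothetical flow cytometry results (not data from this study).
position_samples = {"G": (2.0, 0.8), "A": (2.4, 0.8), "C": (0.2, 0.8), "T": (0.5, 0.5)}
relative = relative_in_vivo_cleavage(position_samples, native_base="G")
print(relative)
```

When every sample in an experiment shares one transfection efficiency, the correction cancels in the final ratio, so it only changes the result when efficiencies differ between samples.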

RESULTS 

In vitro binding specificity 

Target site binding affinities for all four proteins 
were determined using a competitive binding assay (20). 
In brief, N-terminal 6 x His-tagged HE proteins were 
immobilized in 96-well plates, then incubated with a 
fluorescently labeled native target site oligonucleotide 
followed by a 30-fold molar excess of unlabeled competi- 
tor target site DNA. Competitor sites included the native 
target site, single base pair variant target sites or an unre- 
lated sequence as a control for non-specific binding. 
Relative binding affinities were calculated from competi- 
tor concentration-specific loss of fluorescence. Figure 2 
displays the in vitro binding profiles for the native
homodimers I-CreI and I-MsoI.

The 10 bp positions in the I-CreI target site (±3-5 and
±9-10) that were palindromically symmetric between the
left and right half sites displayed a strong preference to
bind native target site base pairs, and greatly reduced
affinities (<10% of native) for each of the other 3 bp
(Figure 2, top panel). Base pairs at these positions make
multiple direct or water-mediated contacts (17,18). In
contrast, the central four target site positions (-2 to +2)
that make no base-specific contacts with I-CreI were able
to bind 1 or 2 bp in addition to the native base pair with
affinities of >50% that of the native base pair. Seven of
the eight remaining I-CreI target site positions (±6-7, +8
and ±11) each bound at least one variant base pair with
affinities of >50% that of the native base pair (Figure 2,
top panel). Only 7 of 66 single base pair substitutions in
the I-CreI target site displayed binding affinities comparable
to the native base pair (-6C>T, +1G>A,
+2A>C or T, +6A>G, +8T>A and +11G>A,
Figure 2, top panel). Of note, many base pair variants
with high binding affinities increased the overall
symmetry of the I-CreI target site (see, e.g. -7A>C
and +7G>T; -6C>T and +6A>G, Figure 2, top
panel).

Figure 2. In vitro binding specificity profiles of I-CreI and I-MsoI. The relative binding affinities of I-CreI (top) and I-MsoI (bottom) for all 4 bp at
each target site position were determined using a competitive in vitro binding assay (20). Target site base pair numbering is as in Figure 1, with
positions differing between I-CreI and I-MsoI shown in boxes (-9 and +10 positions). Bar heights indicate the binding affinity of each protein for
base pair substitutions relative to binding of the native base pair whose binding has been arbitrarily set as 1.0. All results are the mean of three
replications in which the standard deviation between experiments was ±5%. Error bars have been omitted for graphical clarity.

Figure 3. In vitro cleavage specificity profiles of I-CreI and I-MsoI. (A) Outline of the 'bar code' cleavage assay used to assess the comparative
cleavage efficiency of target sites with base pair substitutions (see Methods). Hypothetical base-specific or non-specific cleavage patterns are depicted
at right (25). (B) Agarose gels displaying in vitro cleavage specificity profiles for I-CreI and I-MsoI determined as described in panel (A).
(C) Quantitation of cleavage specificity data in (B), where bar heights indicate extent of cleavage of base pair substitutions relative to the native
base pair whose cleavage efficiency has been arbitrarily set to 1.0. All results are the mean of three replicates in which the standard deviations were
±5%. Error bars have been omitted for graphical clarity.

The in vitro binding profile of I-MsoI differed substantially
from that of I-CreI and was less specific. I-MsoI had only
4 bp positions (versus 10 for I-CreI), positions -9, ±5 and
-3, with a strong preference for binding only the native
base pair (Figure 2, bottom panel). I-MsoI was also more
tolerant of base pair changes in the central four target site
positions (-2 to +2), where all three alternative base pairs
could be bound with affinities ranging from 30 to 90% of
the native base pair (Figure 2, bottom panel). Again, base
pair substitutions that increased overall target site
symmetry, e.g. at positions +6 and +9, increased site
binding affinity (Figure 2, bottom panel).

Figure 4. The in vivo cleavage specificity profiles of mCreI and mMsoI. (A) Assay used to measure in vivo cleavage efficiency of variant target sites in
human cells, where target site cleavage leads to the generation of GFP+ recombinant cells. (B) In vivo cleavage specificity profiles of mCreI and
mMsoI, determined as described in A by co-transfecting endonuclease coding and target site plasmids into human 293T cells. Relative cleavage
efficiencies are plotted as in Figure 3 with the native base pair GFP+ value arbitrarily set to 1.0. All assay values represent the mean of three
replications, with error bars omitted for graphical clarity.

A conspicuous difference between I-MsoI and I-CreI
was site binding symmetry: the left I-MsoI half site (-3
to -11) displayed higher binding specificity than did
I-CreI, whereas the right I-MsoI half site had six positions
(+6 to +11) where one or more variant base pair was
bound with >50% of the affinity of the native base pair.
This difference in I-MsoI half site binding affinities agrees
with our prior analysis of the I-MsoI DNA binding
thermodynamic profile (24).

We used these site binding data to calculate global
binding specificities for I-CreI and I-MsoI. The binding
specificity of I-CreI was ~1.8 × 10^-10, which was
calculated by dividing the number of variant target sites
that were bound with >50% of native site affinity
(n = 3072) by the number of unique target sites of
length 22 bp (n = 4^22 = 1.8 × 10^13). The corresponding
binding specificity of I-MsoI was approximately an
order of magnitude lower, or ~1.6 × 10^-9. The corresponding
binding specificity profiles of mCreI and
mMsoI closely resembled their respective parental
proteins (Supplementary Figure S1), although their
calculated binding specificities were lower: ~1.6 × 10^-9
for mCreI, and ~2.0 × 10^-6 for mMsoI. This difference
may reflect the presence of the 33 residue linkers inserted
to monomerize I-CreI and I-MsoI (Figure 1) (19), and/or
the presence of two His tags on the subunits of the
homodimeric proteins as opposed to the single tag on
each corresponding monomeric protein in binding assays.
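
The arithmetic behind a global specificity value can be sketched in Python. The division of tolerated sites by all possible sites of a given length follows the text. The second part, which obtains the number of tolerated sites as the product of the number of base pairs tolerated at each position, is an assumption about how such a count could arise from single base pair data (it treats positions as independent), and the per-position counts shown are a hypothetical pattern rather than the measured profile.

```python
import math


def global_specificity(n_tolerated_sites: int, site_length_bp: int) -> float:
    """Fraction of all possible sites of this length that are tolerated."""
    return n_tolerated_sites / 4 ** site_length_bp


# Values stated in the text for I-CreI binding: 3072 tolerated sites, 22 bp site.
i_crei_binding = global_specificity(3072, 22)
print(f"{i_crei_binding:.2e}")

# Assumption: tolerated sites counted as a product over positions.
# Hypothetical profile: 11 positions accept 1 bp, 10 accept 2 bp, 1 accepts 3 bp.
tolerated_per_position = [1] * 11 + [2] * 10 + [3]
n_sites = math.prod(tolerated_per_position)
print(n_sites)
```

A smaller value indicates higher specificity, since fewer of the possible sequences are bound.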

In vitro cleavage specificity 

In vitro cleavage specificities were determined using a 
single tube, competitive 'bar code' cleavage assay to sim- 
ultaneously assess the cleavage sensitivity of all 4 bp 
possibilities at each target site position (Figure 3A) (25). 
Target site libraries were constructed in pDR-GFPuniv 
(http://depts.washington.edu/monnatws/plasmids/pDR- 
GFP%20univ.pdf) to permit the same target site libraries 
to be used for in vitro and in vivo cleavage specificity de- 
terminations (Figure 4; see below). In vitro cleavage con- 
ditions were determined to ensure ~50% cleavage of the 
native target site at a 1:1 protein:DNA ratio. This ratio 
was chosen to minimize the effect of binding affinity 
differences upon cleavage. The results of 22 competitive
cleavage reactions that simultaneously assayed all 4 bp 
possibilities at each base pair position were then displayed 
on a single gel for quantitation (Figure 3B). 

I-CreI displayed a strong preference for native base
pairs at many target site positions (±1 to ±10) in
cleavage assays (Figure 3B and C, top panels). The
highest cleavage specificities were observed at target site
positions ±3-4 and ±9-10, and the lowest specificities at
positions ±1, 2 and 8. Only five base pair substitutions were
cleaved more efficiently than the native base pair:
-7A>C, -2G>A, +1G>A, +7G>T and +8T>A.
Two of these substitutions, -7A>C and +7G>T,
increased the overall symmetry of the I-CreI/mCreI
target site.

I-MsoI displayed the highest cleavage specificity at
target site positions ±3 and ±5, and was least specific
at target site positions -1 and +7, 8 and 11 (Figure 3B
and C, bottom panels). At positions -1 and +11, I-MsoI
cleaved all 4 bp possibilities with equal efficiency. A total
of 14 base pair substitutions at seven target site positions were
cleaved as efficiently as the corresponding native base pair:
-7A>T, -1T>A/C/G, +2A>C, +7G>A/C/T,
+8T>A/C, +10C>T and +11G>A/C/T. Four of these
substitutions, -1T>C, +2A>C, +7G>T and +10C>T,
increased the overall symmetry of the I-MsoI target site
(Figure 3C, bottom panel).

The global cleavage specificities of I-CreI and I-MsoI
were calculated from the number of variant target sites
that could be cleaved with >50% of the efficiency of the
native site, divided by the number of unique 20 bp (I-CreI)
or 22 bp (I-MsoI) target sites. These cleavage specificities
were ~1.4 × 10^-8 for I-CreI and ~5.4 × 10^-5 for I-MsoI.
The corresponding in vitro cleavage specificity profiles for
mCreI and mMsoI were very similar to I-CreI and I-MsoI:
respectively ~2.8 × 10^-8 and ~2.4 × 10^-5 (Supplementary
Figure S2). The higher global binding and cleavage
specificities of I-CreI versus I-MsoI can be seen easily in
relative binding and cleavage difference plots that
compare the two endonucleases (Supplementary
Figure S3).

In vitro cleavage of target sites with multiple base pair 
changes 

In order to determine whether single base pair cleavage
data could be used to predict the cleavage sensitivity of
target sites having multiple base pair changes, we analyzed
36 different mCreI target sites containing from 3 to 9 bp
differences from the native I-CreI target site (Figure 1).
An explicit example from these analyses is shown in
Figure 5, in which our mCreI cleavage specificity profiles
were used to search for engineerable target sites in the
human SBDS gene that could be targeted to catalyze gene repair.

SBDS-inactivating mutations cause Shwachman-Diamond
syndrome (SDS), a rare, heritable bone
marrow failure syndrome characterized by congenital
abnormalities, hematopoietic failure and cancer predisposition
(26). The human SBDS CHS2 mCreI target site
is located in SBDS intron 1, upstream of the location of
>90% of SDS-causing SBDS mutations (27). The CHS2
SBDS site differs at 9 bp positions from the native I-CreI
target site (Figure 5A). Our in vitro cleavage data and a
prior systematic computational protein design analysis of
mCreI (25) indicated that four of these base pair differences
should be recognized and cleaved by native I-CreI/
mCreI (-8, -1, +1 and +2), and an additional 3 bp
changes at positions -9, -7 and -5 could be successfully
targeted by previously identified mCreI protein computational
designs (25). The remaining 2 bp differences, at positions
-1 and +7, were predicted to reduce cleavage.

Cleavage analysis of CHS2 site variants containing different
combinations of these base pair changes allowed us
to verify the predictions of cleavage sensitivity for the
combined base pair differences at positions -8, -1, +1
and +2, and that substitutions at -1 and +7 reduced
though did not abolish cleavage. Similar analyses of 35
additional target sites, chosen on the basis of cleavage
degeneracy data and engineerability and containing 3 to 7 bp
differences from the native I-CreI target site, revealed that a
majority (31/35, or 89%) predicted to be cleavage-sensitive
from our single base pair scanning data were
cleavage-sensitive, and that 11 of these sites (31%) were
cleaved with efficiencies comparable to the native I-CreI
target site. Many target sites with up to three contiguous
base pair changes, each having relative cleavage activities
of >0.5 versus the native site, were cleavage-sensitive as
predicted, whereas target sites having four or five contiguous
substitutions where one or two substitutions had
relative cleavage activities of <0.5 in our single base pair
scan data were largely cleavage-resistant (additional
results not shown).
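
A simple decision rule in the spirit of these observations can be sketched in Python. It predicts a multiply substituted site to be cleavage-sensitive only if every individual substitution retained a relative cleavage activity above 0.5 in the single base pair scan. The rule, the function name and the scan values are illustrative assumptions; the values are hypothetical and are not the measured profile.

```python
# Hypothetical single base pair scan: (position, substituted base) -> relative
# cleavage activity, with the native base pair at each position defined as 1.0.
SCAN = {
    (-7, "C"): 1.2,
    (+1, "A"): 1.1,
    (+8, "A"): 1.05,
    (+2, "T"): 0.0,
    (-3, "A"): 0.05,
}


def predict_cleavage_sensitive(substitutions, scan=SCAN, threshold=0.5):
    """Return True if every substitution exceeds the threshold on its own."""
    return all(scan[sub] > threshold for sub in substitutions)


site_a = [(-7, "C"), (+1, "A"), (+8, "A")]   # all substitutions well tolerated
site_b = [(-7, "C"), (+2, "T")]              # one substitution blocks cleavage

print(predict_cleavage_sensitive(site_a))
print(predict_cleavage_sensitive(site_b))
```

Such a rule ignores context effects between neighboring substitutions, which is consistent with the observation above that predictions hold for most, but not all, multiply substituted sites.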

Using single base pair scan data to predict the potential 
for evolutionary spread 

We also used our single base pair cleavage data to gauge
the potential of I-CreI or I-MsoI for lateral transfer to
additional organisms. To do this, we identified target site
variants that retained 28S rRNA secondary structure motifs required
for function (the extrahelical +6A base and a paired
stem-loop structure; Figure 6A), and were predicted to
be highly cleavage-sensitive from our target site scan
results. The I-CreI site predicted to be the most
cleavage-sensitive by these criteria had -7C and +8G
base pair substitutions. Blastn searches using this site
identified 75 perfect matches in nucleotide sequence databases,
of which 55 were in LSU ribosomal RNA genes.
Two I-MsoI sites with predicted higher cleavage sensitivity,
in contrast, did not have perfect matches that could be
identified by Blast searching (Supplementary Table S1;
additional results not shown).

In vivo cleavage specificity 

In light of growing interest in using HE proteins for 
genome engineering and gene therapy, we also determined 
the cleavage specificities of mCreI and mMsoI in human
cells. These experiments used the same target site libraries 
constructed for in vitro cleavage specificity experiments 
(Figure 3). Target site cleavage of site plasmids in vivo 
was quantified from the frequency of cleavage-dependent 
generation of recombinant, GFP+ cells (Figure 4A) (19). 
Figure 5. In vitro cleavage specificity profiling guides the generation of target site-specific LHE variants. (A) The workflow for engineering mCreI
towards novel target sites, using the human Shwachman-Diamond syndrome gene SBDS CHS2 site as an example. The first step is to predict the
cleavage sensitivity of target sites containing multiple base pair changes from single base pair cleavage sensitivity profiling data (see, e.g. Figure 2).
These predictions can be experimentally verified in a second step, and then combined with LHE protein designs to generate a target site-specific LHE
variant. The activity and specificity of this novel target site-specific variant can then be further improved by a combination of selection or screening.
(B) Schematic overview of in vitro competitive cleavage assay. Both native and novel target sites are cloned into a plasmid that is linearized prior to
LHE cleavage. Cleaved products are visualized on an agarose gel to determine the relative cleavage efficiency of the native and test sites from relative
band intensities. (C) Agarose gels displaying in vitro cleavage efficiency of mCreI on three CHS2 sites that contain different combinations of base pair
changes.



In vivo cleavage specificity profiles for mCreI and mMsoI
closely resembled those determined in vitro (Figures 3C
and 4B). One conspicuous difference was the overall
higher specificity of mMsoI cleavage in vivo at right half
site positions +7 to +11. In contrast, mCreI in vitro and
in vivo cleavage specificity profiles closely resembled one
another (Figures 3C and 4B).

DISCUSSION 

Relationship of in vitro binding and cleavage specificities 

We observed a strong correlation between binding and
cleavage at many I-CreI and I-MsoI target site positions.
Most substitutions that reduced binding also comparably
reduced cleavage efficiency (Figures 2 and 3). Of greater
interest were base pair substitutions that disproportionately
affected binding or cleavage: these substitutions
may provide insight into dynamic aspects of binding and
cleavage complex formation. Four I-CreI site
substitutions substantially reduced binding, but retained
>50% of the cleavage activity of the native target site
(-8A>C or G, -5G>T and -2G>A). One substitution,
+2A>T, displayed native binding affinity but no
detectable cleavage activity. I-MsoI had 15 base pair
substitutions that disproportionately reduced binding versus
cleavage. One I-MsoI substitution, +9T>C, reduced
cleavage by ~40%, while enhancing binding to >100%
of the native level (Figures 2 and 3). These 'uncoupling'
base pair substitutions
are easy to identify in difference plots that compare
binding and cleavage activity at each target site base pair
position (Supplementary Figure S4).

Base pair substitutions that selectively affect binding
but not cleavage may allow transition state complexes to
be formed that include new stabilizing interactions, or that
do not depend on stabilizing interactions in the ground
state. Conversely, base pair substitutions that selectively
affect cleavage but not binding might create interactions
that stabilize the ground state, or that hinder conformational
changes required to form a transition state
complex. It may be possible to discriminate among these
models by determining the predicted binding energies or
structures of specific 'uncoupling' substitutions captured
in both pre-cleavage and cleavage complexes.

Figure 6. Location of I-CreI and I-MsoI target sites in host ribosomal RNA genes. (A) The secondary structure of domain V in E. coli 23S rRNA
gene, in which the n.2593 target site region recognized by I-CreI and I-MsoI is shown in red and the intron insertion site indicated by an arrow. The
I-CreI and I-MsoI target sites are aligned below the stem-loop to indicate intron insertion sites (down arrow), strand cleavage sites (filled arrow
heads) and the location of the extrahelical base at the +6 position that plays a role in peptide release (dot) (38). (B) Aligned sequences of the
corresponding n.2593 target site region from LSU rRNA genes from chloroplast or mitochondrial 23S ribosomal RNA genes of green algae, with the
E. coli 23S ribosomal RNA gene shown at bottom (33). Target sites are surrounded by a red box, with the central four positions (-2 to +2) shaded
in magenta. Positions conserved across all target sites are underlined with an asterisk.

An important determinant of the higher overall specificity
of I-CreI is the larger number of direct and
water-mediated contacts made with target site DNA: I-CreI
makes an average of 2.4 contacts (direct or water-mediated)
with each target site position, whereas I-MsoI
makes an average of only 1.7 contacts/bp (18). The most
specific target site positions in both I-CreI and I-MsoI, e.g.
the ±3-5 target site positions (Figures 2 and 3), share
many conserved DNA-protein contacts that may be 
required in both proteins to correctly position target site 
DNA to promote cleavage (Figure 1C) (16,18). Additional 
DNA-protein contacts at positions more distant from the 
active sites, e.g. positions ±6-11, help ensure high site 
binding affinity and sequence-specific discrimination. 

Target site symmetry also plays an important role in 
determining overall site specificity and, as we discuss 
below, is likely to be an important constraint on both 
lateral transfer and HE evolution. The importance of 
site symmetry is most clearly revealed by base pair
substitutions that create a higher degree of I-CreI or I-MsoI
target site symmetry: these substitutions almost invariably
lead to enhanced target site binding and/or cleavage [see,
e.g. (24)].
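
One simple way to enumerate symmetry-increasing substitutions
is sketched below in Python. This is an illustrative sketch
under stated assumptions rather than the procedure used here:
symmetry is scored as the number of left half-site positions
that are the reverse complement of the matching right half-site
position, the function names are hypothetical, and the 8 bp
example sequence is invented and is not a native target site.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def symmetry_score(site: str) -> int:
    """Count left-half positions whose complement matches the mirrored
    right-half position (a fully palindromic site scores len(site) // 2)."""
    site = site.upper()
    if len(site) % 2:
        raise ValueError("expected an even-length target site")
    half = len(site) // 2
    return sum(COMPLEMENT[site[i]] == site[-1 - i] for i in range(half))


def symmetrizing_substitutions(site: str) -> List[Tuple[int, str]]:
    """List single-base substitutions, as (0-based index, new base),
    that raise the symmetry score of the site."""
    site = site.upper()
    baseline = symmetry_score(site)
    found = []
    for i, old in enumerate(site):
        for new in "ACGT":
            if new == old:
                continue
            variant = site[:i] + new + site[i + 1:]
            if symmetry_score(variant) > baseline:
                found.append((i, new))
    return found


# Invented 8 bp sequence for illustration only; not a native HE target site.
example = "GAACGTAC"
print(symmetry_score(example))
print(symmetrizing_substitutions(example))
```

Substitutions returned by such a search are candidates for
enhanced binding and/or cleavage; whether a given substitution
actually has that effect must be read from the profiling data.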

Site specificity profiling enables HE-mediated genome 
engineering applications 

HE site specificity profiles provide useful information
to guide the generation of HE variants for genome
engineering (6,28-31). Position-specific search or scoring
matrices (PSSMs) can be constructed from profiling
data such as those presented in Figure 2, and used
to identify gene-specific or genomic target sites that
have a high likelihood of being bound or cleaved by
specific HEs. The functional consequences of base pair
differences, including SNP variants, in 'near match' sites
can be predicted from profiling data, as shown in
Figure 5. The most important target site positions on
which to focus specificity engineering efforts can be
identified early, as can genomic target sites where there
are few or no DNA contacts to modify to achieve higher
or altered specificity (e.g. in the central four target site
positions -2 to +2). Target sites likely to be confounded
by low specificity or the potential for substantial off-target
cleavage activity can also be identified and avoided
where there are better alternatives. These results provide
a good example of how single base pair profiling data can
be used to determine engineering feasibility, and to focus
protein engineering on the specific base pair positions
where it is required to achieve new site specificity.
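
As a minimal sketch of how a PSSM might be built from single
base pair profiling data and used to scan a sequence, consider
the Python example below. It rests on assumptions that are not
part of this study: positions are treated as contributing
independently, each matrix entry is the log10 of activity
relative to the native base, a floor of 0.01 stands in for
undetectable activity, and only one strand is scanned. The
function names, the 4 bp toy profile and the threshold are all
hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One dict per target site position: base -> activity relative to native (1.0).
PositionProfile = Dict[str, float]


def build_pssm(profile: List[PositionProfile],
               floor: float = 0.01) -> List[Dict[str, float]]:
    """Convert relative activities to log10 scores.

    floor is an assumed stand-in for undetectable activity so that
    log10(0) is never evaluated.
    """
    return [{base: math.log10(max(act, floor)) for base, act in pos.items()}
            for pos in profile]


def score_site(pssm: List[Dict[str, float]], site: str) -> float:
    """Sum per-position scores, assuming positions act independently."""
    if len(site) != len(pssm):
        raise ValueError("site length must match the PSSM length")
    return sum(col[base] for col, base in zip(pssm, site.upper()))


def scan(pssm: List[Dict[str, float]], sequence: str,
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (0-based start, site, score) for top-strand windows
    scoring at or above threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        score = round(score_site(pssm, window), 3)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy 4 bp profile with invented values; not data from Figure 2.
toy_profile = [
    {"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.5},
    {"A": 0.0, "C": 1.0, "G": 0.2, "T": 0.1},
    {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0},  # position with no specificity
    {"A": 0.1, "C": 0.1, "G": 1.0, "T": 0.1},
]
pssm = build_pssm(toy_profile)
print(scan(pssm, "TTACAGTT", threshold=-1.0))
```

Under this scoring the native site scores 0 and each
disfavored base lowers the score, so the threshold sets how
many 'near match' sites are reported. Additive scoring is only
an approximation when multiple substitutions interact.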

I-CreI and I-MsoI site specificity versus host target gene
structure 

Many HE ORFs are found in large or small subunit
ribosomal RNA genes (the LSU/23S/25S/28S and
SSU/16S/18S rRNA genes) (16,32). The native
I-CreI/I-MsoI LSU target site resides in a highly
conserved segment of the chloroplast LSU genes of 
Chlamydomonas and Monomastix (LSU n.2593, where nu- 
cleotide numbering is keyed to the reference Escherichia 
coli LSU 23S ribosomal RNA gene sequence) (33). The 
corresponding portion of LSU rRNA is located in the 
central loop of domain V that includes the peptidyl trans- 
ferase center (Figure 6A) (34-36). Nucleotides flanking 
the LSU n.2593 insertion site display 2-fold symmetry, 
and form a stem-loop structure in rRNA secondary 
structure models. The stem in structural models is formed
by base pairs at nucleotide positions n.2588-2594
(corresponding to Cre/Mso target site positions -9 to -3) and
n.2599-2606 [corresponding to target site positions +3 to
+10; Figures 1 and 6; (37)]. The four nucleotides between
the two half-sites, n.2595-2599, form an unpaired loop in
rRNA that corresponds to the central four nucleotides in
the Cre/Mso target site (positions -2 to +2, Figures 1
and 6A). The A residue at position n.2602, corresponding
to position +6 within the Cre/Mso target site, is
extrahelical in RNA secondary structure models and
has been shown to be essential for ribosomal peptide
release (38).

The n.2588-2606 stem-loop region of LSU rDNA thus 
provides a well-defined and highly conserved target for the 
lateral transfer of HE-encoding mobile introns. The 
binding and cleavage specificity profiles of I-CreI and
I-MsoI reflect and exploit these LSU target site
constraints. LSU bases that form the stem-loop structure in
LSU rRNA correspond to Cre/Mso target site positions
-9 to -3, +3 to +5 and +7 to +10 (Figure 6A), where
I-CreI and I-MsoI display high binding and cleavage
specificity (Figures 2 and 3). The central 4 bp positions (-2 to
+2) in both target sites, in contrast, are located in a loop
with few or no apparent sequence constraints in rRNA
secondary structure models, and these positions
contribute little target site binding or cleavage specificity
(Figures 2 and 3).

Implications for HE evolution 

The near-perfect 2-fold symmetry of the n.2593 LSU 
target site is dictated by rRNA functional constraints. 
These constraints, in turn, may strongly influence HE 
protein evolution at several levels. One potential advan- 
tage of using a highly conserved, largely symmetric target 
site for homing is that symmetric sites can be effectively 
targeted by small, homodimeric HE proteins encoded by a 
single, short open reading frame. This permits homing to 
be mediated by the lateral transfer of a small open reading 
frame and accompanying intron or intein that is easily and 
reliably transferred. Small mobile intron/intein open 
reading frames also present a small target for potentially 
inactivating mutations. 

Duplication or duplication and fusion of an open 
reading frame encoding a homodimeric LHE subunit 
opens up another evolutionary opportunity: the ability 
to target asymmetric, degenerate or non-palindromic 
target sites. This strategy can be glimpsed in the structure
of I-MsoI, a symmetric homodimeric LHE that uses
asymmetric contacts to recognize a target site with a
high degree of symmetry (18). Another particularly
instructive example is I-CeuI, an asymmetric,
homodimeric LHE from Chlamydomonas eugametos that
uses unique structural elaborations on the core LHE fold
to cleave the highly asymmetric n.1923 LSU target site in
C. eugametos chloroplast DNA. Of note, I-CeuI retains
cleavage activity on symmetric-left or symmetric-right
target sites (39). The ability to cleave related symmetric
and asymmetric target sites could broaden the range of
potential LHE hosts to include organisms with related
asymmetric, as well as symmetric, target sites.

The substantially different structural solutions used by 
I-CreI, I-MsoI and I-CeuI to target LSU sites with
differing degrees of asymmetry suggest two different
evolutionary strategies that may ensure the evolutionary persistence
of HE proteins and their encoding selfish DNA elements.
I-CreI, with a rich set of DNA-protein contacts, can
discriminate between closely related target sites (17,18,39).
A potential advantage of this higher site specificity is the
ability to evolve higher cleavage activity to aid lateral
transfer, without substantially increasing
cleavage-dependent host toxicity (19). I-MsoI and I-CeuI, in
contrast, may be able to spread to a wider range of new
hosts by virtue of less stringent target site sequence
requirements. The potentially deleterious consequences
of lower site specificity may be offset by lower cleavage
activity, as is the case for I-MsoI. Either of these strategies
for coupling site recognition specificity and cleavage
activity could represent a viable (or preferred) strategy
for lateral transfer and evolutionary persistence in host 
populations with differing degrees of target site sequence 
divergence. 

Host accommodation following lateral transfer repre- 
sents another important determinant of HE evolution. 
Several strategies for host accommodation have been 
identified among HEs. These include use of an 
HE-encoded maturase function to improve the expression 
of host genes; host genetic code adoption and the use of 
host codon preferences to improve HE expression; and 
modulation of HE protein expression to ensure adequate 
expression of both host gene and HE open reading 
frame-encoded proteins (6,40,41). It should be possible 
to experimentally determine the contribution of these de- 
terminants of HE lateral transfer using the systems that 
have been developed to study homing and LAGLIDADG 
HEs in prokaryotes (22,42,43), single cell eukaryotes 
(44,45), and most recently metazoans (46,47). 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Figures S1-S4 and Supplementary Table S1.